 rademacher complexity



Supplementary Material

Neural Information Processing Systems

The supplementary material is organized as follows. We give details of the definitions and notation in Section B.1. Then, we provide the technical details of the lower bound (Lemma 3.3). In Section D.4 we provide insights into auto-labeling using ... This suggests that, in these settings, auto-labeling using active learning followed by selective classification is expected to work well. This idea is captured by Chow's excess risk [...]. Nevertheless, it would be interesting future work to explore the connections between auto-labeling and active learning with abstention.







Hypervolume Maximization: A Geometric View of Pareto Set Learning

Neural Information Processing Systems

This paper presents a novel approach to multiobjective optimization that models the Pareto set using neural networks. Whereas previous methods mainly focused on identifying a finite number of solutions, our approach allows for the direct modeling of the entire Pareto set. Furthermore, we establish an equivalence between learning the complete Pareto set and maximizing the associated hypervolume, which enables a convergence analysis of hypervolume (as a new metric) for Pareto set learning. Specifically, our new analysis framework reveals the connection between the learned Pareto solution and its representation in a polar coordinate system. We evaluate our proposed approach on various benchmark and real-world problems, and the encouraging results make it a potentially viable alternative to existing multiobjective algorithms.
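To make the hypervolume metric mentioned above concrete, here is a minimal sketch of the standard two-objective hypervolume indicator for a finite set of points. It assumes both objectives are minimized and that a reference point is supplied by the user. The function name `hypervolume_2d` and the example values are illustrative only; this is not the paper's neural Pareto set model or its polar-coordinate formulation.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (minimization).

    points:    iterable of (f1, f2) objective pairs
    reference: (r1, r2), assumed to be worse than every point of interest
    """
    r1, r2 = reference
    # Keep only points that strictly dominate the reference point.
    valid = sorted((f1, f2) for f1, f2 in points if f1 < r1 and f2 < r2)

    volume = 0.0
    best_f2 = r2  # lowest second objective seen so far in the sweep
    for f1, f2 in valid:
        if f2 < best_f2:
            # This point adds a new rectangular slab to the dominated region.
            volume += (r1 - f1) * (best_f2 - f2)
            best_f2 = f2
        # Otherwise the point is dominated by an earlier one and adds nothing.
    return volume


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, reference=(4.0, 4.0)))
```

A larger value means the point set covers more of the objective space below the reference point, which is why maximizing hypervolume pushes a set of solutions toward the Pareto front.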


Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

Taiji Suzuki, Denny Wu

Neural Information Processing Systems

Mean-field Langevin dynamics (MFLD) (Mei et al., 2018; Hu et al., 2019) is particularly attractive due to ... MFLD arises from a noisy gradient descent update on the parameters, where Gaussian noise is injected into the gradient to encourage "exploration". Furthermore, uniform-in-time estimates of the particle discretization error have also been established (Suzuki et al., ...). The goal of this work is to address the following question.
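The noisy gradient descent update described above can be sketched on a toy problem as follows. Everything here is an assumption made for illustration: the names `noisy_gd_step`, `toy_grad` and `run`, the scalar particles, the quadratic toy objective on the particle mean, and the noise scale `sqrt(2 * step * temperature)`, which is the usual Euler-Maruyama discretization of Langevin dynamics. It is not the setting or the algorithm analyzed in the paper.

```python
import math
import random


def noisy_gd_step(particles, grad_fn, step, temperature, rng):
    """One noisy gradient descent update applied to every particle.

    Each particle moves along the negative gradient and receives
    independent Gaussian noise with scale sqrt(2 * step * temperature).
    """
    noise_scale = math.sqrt(2.0 * step * temperature)
    return [
        x - step * grad_fn(x, particles) + noise_scale * rng.gauss(0.0, 1.0)
        for x in particles
    ]


def toy_grad(x, particles, target=1.0):
    """Gradient of the hypothetical objective (mean(particles) - target)^2.

    The gradient depends on the whole particle population through its mean,
    which is the kind of mean-field interaction the sketch is meant to show.
    """
    mean = sum(particles) / len(particles)
    return 2.0 * (mean - target)


def run(num_particles=50, steps=200, step=0.1, temperature=0.0, seed=0):
    """Simulate the particle system and return the final particle positions."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for _ in range(steps):
        particles = noisy_gd_step(particles, toy_grad, step, temperature, rng)
    return particles


if __name__ == "__main__":
    final = run(temperature=0.01)
    print(sum(final) / len(final))
```

With `temperature=0.0` the update reduces to plain gradient descent and the particle mean converges to the target; a positive temperature keeps the particles spread out, which is the "exploration" effect of the injected noise.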


5927edd18c5dd83aa8936a4610c72029-Supplemental-Conference.pdf

Neural Information Processing Systems

In this section, we examine our theoretical results with controlled experiments on synthetic data. We do not have a complete explanation for such spikes. At first glance, overfitting could happen when the number of linear measurements is less than the size of the ground-truth matrix. Moreover, when the measurements satisfy the restricted isometry property (RIP), Li et al. ... Soltanolkotabi [45] show that GD exactly recovers the ground truth. To the best of our knowledge, most existing generalization analyses of flat regularization are for two-layer models, e.g., Li et al.